In 1966, James Coleman published a report entitled “Equality of Educational Opportunity.” After studying over 150,000 students, Coleman and his colleagues found that student background and socio-economic status had far more influence on educational outcomes than did widely measured differences in school resources. This conclusion has since been widely replicated.
Yet over forty years after the Coleman report, the resource-based view of school quality is just starting to lose its grip. For example, “highly qualified teachers” are typically defined in terms of degrees, certifications, courses taken, and years of experience. Although these factors have the advantage of being easily measurable, they are generally found to have little effect on student achievement gains.
One objection to the Coleman findings is that they allow schools to avoid responsibility for student achievement: if students do poorly, in this view, the blame falls on environment, poverty, and family.
Another response to these issues has been the development of so-called “value-added” models. While these models can take various forms, the basic approach is to take data on student achievement and regress it (or fit a regression-like model) against various demographic factors that might influence achievement.
The resulting models can then be used to predict expected average student achievement for, say, a school based on that school's demographics. If the predicted and actual outcomes diverge, the reasons for the divergence could be explored. For example, if a school consistently overperforms its predicted achievement, perhaps the school has found practices that help increase achievement and that other schools could emulate.
One version of a value-added model uses publicly available data on schools, often published by grade level. Typically this includes average scores on each test given, the average poverty level (generally measured by the percentage of students qualifying for free or reduced-price lunch), student mobility rate, and other indicators. In some cases these can be broken out by ethnic group and gender.
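To make the mechanics concrete, here is a minimal sketch of such a school-level regression in Python using statsmodels. The data and column names are entirely hypothetical; a real analysis would use a district's published grade-level files.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical school-level data: one row per school.
# Column names are illustrative, not from any real district's files.
schools = pd.DataFrame({
    "school":        ["A", "B", "C", "D", "E", "F", "G", "H"],
    "mean_score":    [652, 618, 701, 640, 588, 660, 612, 675],
    "pct_frl":       [0.45, 0.70, 0.15, 0.55, 0.85, 0.40, 0.75, 0.25],
    "mobility_rate": [0.10, 0.22, 0.05, 0.15, 0.30, 0.12, 0.25, 0.08],
})

# Regress average achievement on the demographic factors.
model = smf.ols("mean_score ~ pct_frl + mobility_rate", data=schools).fit()

# Each school's residual is its deviation from the score predicted by
# its demographics; large positive residuals flag consistent overperformers.
schools["predicted"] = model.fittedvalues
schools["residual"] = model.resid
print(schools.sort_values("residual", ascending=False))
```

In practice one would pool several years of data per grade and test before reading much into any single school's residual.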
I developed an example of such a model some years ago. Such a model can be used to identify schools with stronger or weaker outcomes. It can also be useful for assessing the effectiveness of a particular program, so long as that program is introduced into only a limited number of schools, allowing the others to serve as a comparison group.
A follow-up study used the model to assess whether schools adopting one of three new programs changed their average ratings. The table below summarizes the results.
| Program | Result |
| --- | --- |
| Small class size (SAGE) | No significant effect |
| Direct Instruction | Significant improvement |
| Curriculum alignment | Significant improvement |
In addition to applying this type of model to school data within a particular school district, the approach could also be used with the data schools report to the US Department of Education in its Common Core of Data. See our discussion of the use of NAEP test data to compare charters with traditional public schools.
Today, a more common version of value-added models incorporates information on the performance of individual students over a sequence of tests. This, of course, requires the ability to track individual students over time. Typically these models include student demographics as well as previous test scores, and use that information to predict how each student is expected to perform on the post-test.
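As a sketch of this student-level version, assuming hypothetical records with one prior-year score per student: regress the post-test on the pre-test plus demographics, and take the gap between actual and predicted scores as growth beyond expectation.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical longitudinal records: one row per student, with a
# prior-year score, a current (post) score, and a poverty indicator.
students = pd.DataFrame({
    "post_score": [540, 610, 575, 660, 500, 630, 585, 520],
    "pre_score":  [520, 600, 560, 640, 510, 590, 570, 530],
    "frl":        [1, 0, 1, 0, 1, 0, 0, 1],
})

# Predict each post-test from the pre-test plus demographics; the
# residual is the student's growth beyond (or below) the prediction.
fit = smf.ols("post_score ~ pre_score + frl", data=students).fit()
students["value_added"] = students["post_score"] - fit.fittedvalues
print(students)
```

Averaging these residuals over a teacher's or school's students is, in essence, how the unit-level value-added estimate is formed.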
Depending on the model, one issue to consider is the effect operating at each level: student, classroom, and school. A student's performance may depend partly on that student's own characteristics, but also on the classroom and the school, and may be correlated with the performance of other students in the same classroom and school. This has led to the development of “hierarchical” models that try to separate school effects from classroom effects and from individual student variability.
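One standard way to fit such a model is mixed-effects regression, with a random intercept for each school and a variance component for classrooms nested within schools. The sketch below uses statsmodels' MixedLM on synthetic data; the sizes and effect magnitudes are invented purely to make the example run.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 20 schools, 3 classrooms each, 10 students per classroom.
rng = np.random.default_rng(0)
rows = []
for s in range(20):
    school_eff = rng.normal(0, 5)            # school-level shift
    for c in range(3):
        class_eff = rng.normal(0, 3)         # classroom-level shift
        for _ in range(10):
            pre = rng.normal(500, 40)
            post = 50 + 0.9 * pre + school_eff + class_eff + rng.normal(0, 10)
            rows.append({"school": s, "classroom": c, "pre": pre, "post": post})
df = pd.DataFrame(rows)

# Random intercept per school, plus a variance component for classrooms
# nested within schools (the vc_formula is evaluated within each group).
md = smf.mixedlm(
    "post ~ pre", df,
    groups="school",
    vc_formula={"classroom": "0 + C(classroom)"},
)
result = md.fit()
print(result.summary())
```

The estimated variance components indicate how much of the score variation sits at the school level versus the classroom level, relative to student-level noise.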
Value-added models have the potential to help make better decisions in several areas, notably:
- Teacher preparation programs. Do graduates of some education schools produce greater student growth than graduates of others? Assuming the schools want to improve the effectiveness and job satisfaction of their graduates, they should find such information useful.
- Interventions. Identifying which educational programs work and which do not, starting with those for reading and mathematics.
- School leadership. Evaluating the effectiveness of principals. Much of the success of a school reflects its leadership.
- Teachers. Given the inability of traditional measures to predict student achievement, value-added measures to enrich teacher evaluations are probably coming regardless of opposition from unions or the education establishment.
The last of these is by far the most controversial use of value-added measures. A briefing paper from the Economic Policy Institute and an article published by the American Mathematical Society summarize the arguments against using value-added measures for teacher evaluation. Some concerns raised include:
- A lack of stability from one year to the next, so that teachers may move between effectiveness ratings essentially at random. In one study, the authors developed a value-added model and then found its ratings to be unstable. At this point I am not aware of any stability study of the models actually proposed for use.
- That many teachers do not teach math and reading, the subjects most likely to be tested annually.
- That the demographic data commonly available may not fully capture student differences. For example, some children of immigrants may come from families which, although poor, put a high priority on education.
- Finally, there is a fear that pressure to raise student test scores will produce counterproductive results, including cheating or time spent on activities aimed at raising test scores without increasing the underlying skills.
While these concerns raise valid issues, one must ask what the alternatives are. In the past, schools were often rated on their average test scores, with no adjustment for the challenges their students faced. Teachers were commonly given no meaningful evaluations at all.